Partially Recursively Induced Structured Moderation (PRISM) for modeling racial differences in endometrial cancer survival

Purpose Health disparities are driven by a complex interplay of determinants operating across multiple levels of influence. However, while recognized conceptually, much disparities research fails to capture this inherent complexity in study focus and/or design; little of such work accounts for the interplay across the multiple levels of influence from structural (contextual) to biological or clinical. We developed a novel modeling framework that addresses these challenges and provides new insights. Methods We used data from the Florida Cancer Data System on endometrial cancer patients and geocoded-derived social determinants of health to demonstrate the applicability of a new modeling paradigm we term PRISM regression. PRISM is a new highly interpretable tree-based modeling framework that allows for automatic discovery of potentially non-linear hierarchical interactions between health determinants at multiple levels and differences in survival outcomes between groups of interest, including through a new specific area-level disparity estimate (SPADE) incorporating these multilevel influences. Results PRISM demonstrates that hierarchical influences on racial disparity in endometrial cancer survival appear to be statistically relevant and that these better predict survival differences than only using individual level determinants. The interpretability of the models allows more careful inspection of the nature of these hierarchical effects on disparity. Additionally, SPADE estimates show distinct geographical patterns across census tracts in Florida. Conclusion PRISM can provide a powerful new modeling framework with which to better understand racial disparities in cancer survival.


Introduction
Cancer disparities are created and maintained through a complex interplay of factors on multiple levels of influence. Numerous models have conceptually organized the multiple levels of health determinants, both in the context of illness broadly [1][2][3] and along the cancer continuum specifically [4][5][6]. For example, Lynch and Rebbeck's Multi-level Biological and Social Integrative Construct integrates macroenvironmental (e.g. health policy, neighborhood exposures, and family structure), individual (e.g. behavior and sociodemographic) and biological (e.g. genome, tissue and cell features) levels to assess cancer etiology and outcomes [6]. Understanding factors on each of these levels as well as the interaction between them is essential for comprehensively characterizing the etiology of cancer health disparities across time and place. Although conceptual models acknowledging hierarchical social determinants of cancer outcomes are widely accepted, existing statistical models are limited in their capacity to capture the multi-level nature of such determinants and their complex interplay with more proximal prognostic factors [4,6].
For instance, although overall incidence of endometrial cancer (EC) is highest among White women, Black women suffer higher mortality and worse survival from EC than do their White counterparts [7][8][9]. Black women have more high-risk tumors (which encompasses both type and grade) and more late-stage disease, regardless of type [10]. Although diagnosis with Type II EC is likely partially a function of genetic predisposition [11], racial survival disparities persist after controlling for clinical disease characteristics like grade and histology [12]. Poor EC survival outcomes in Black women thus reflect more than the racial patterning of prognostic factors associated with disease biology; they likely also reflect the social-environmental context in which disease occurs, and how that context interacts with individual-level disease features. Existing studies that attempt to account for the contribution of social determinants to EC survival disparities typically do so by controlling for individual indicators, like patient socioeconomic status (SES), health insurance, or aggressive histology, in traditional Cox proportional hazards models, without integrating the interaction between multiple levels of factors likely driving these disparities [8,13,14]. In contrast, tree-based statistical methods can capture hierarchies between health determinants as well as interactions across levels, without relying on linear methods, which may not reflect the true relationships between them. However, to our knowledge, these interactions have not been articulated methodologically.
In this paper, we present a novel statistical approach that aims to accomplish this within the context of cancer health disparities, specifically those related to EC survival. Our tree-based approach, termed PRISM, will help identify population sub-groups at greatest risk of contributing to excess disease burden by modeling the hierarchical effects of neighborhood-level determinants of EC survival and their complex interactions with individual-level disease features. In turn, this will provide a more precise characterization of disparities and strategies for reduction.

Ethics statement
All data in the Florida Cancer Data System (FCDS) are fully anonymized and thus were fully anonymized before we accessed them.
This study utilizes data from the Florida Cancer Data System (FCDS), a statewide incident cancer registry that provides ongoing surveillance of new cancer cases from diagnosis to death. In total, 320 hospitals report nearly 115,000 new cases annually. The FCDS is designed to collect systematic data on the clinical attributes of disease. Our cohort consisted of female Non-Hispanic White (NHW) and Non-Hispanic Black (NHB) endometrial cancer cases in FCDS from 2005 to 2014. Our primary outcome of interest was overall survival measured in days from date of diagnosis. We included the following individual-level characteristics in these analyses based on data in the FCDS: (a) race; (b) marital status; (c) insurance type; (d) age at diagnosis; (e) histologic type; (f) grade; (g) stage; and (h) course of treatment.
Race was operationalized as an indicator variable corresponding to identifying as non-Hispanic Black (NHB). We categorized the patient's marital status at the time of primary diagnosis into one of five categories: married, separated, single, unmarried or domestic partner, and widowed. Insurance type consists of the patient's primary method of payment or insurance coverage at the time of initial diagnosis and/or treatment, categorized as Private, Medicaid, Medicare, Military/VA/Tricare/HIS, Uninsured, or Insured, type not specified. Age at diagnosis was considered as a continuous variable measured in years. Histologic type identifies the microscopic anatomy of cells, is a basis for staging and the determination of treatment options, and affects the prognosis and course of the disease. We included four categories for these analyses: carcinosarcoma, endometrioid adenocarcinoma, uterine serous carcinoma, and undifferentiated endometrial cancer. The grade of the tumor describes the resemblance of the tumor to normal tissue. Cancers were coded as grade 1 through 4, where higher grades reflect less cell differentiation. Cancers were classified as stage 1-4 at diagnosis as defined in FCDS by the American Joint Committee on Cancer Staging Manual, 6th edition [15]. Furthermore, rather than using a hybridized variable defined by combining stage and grade (to address redundancy), we used stricter histology and grade criteria for inclusion. For example, we excluded non-endometrioid histologies with either missing or misclassified/inconsistent grade codes (e.g. coded 1 or 2), and defined endometrioid using only three histologies [16]. Three binary indicators for radiation, chemotherapy, and surgery reflect whether the treatment was delivered as part of a patient's first-course therapy. Receipt of surgery was defined as receipt of hysterectomy (partial or complete).
In recent years, geocodes for patients' census tract at diagnosis have been included, which enables characterization of neighborhoods by key social determinants. In these analyses, we considered the following census-tract-level variables, extracted from the American Community Survey (ACS) based on their potential for mediating, moderating, driving, and/or confounding racial disparities in cancer: (a) median household income, (b) GINI coefficient (a measure of income inequality), (c) percent of individuals living below the poverty line, (d) percent of individuals 16+ in civilian labor force who are unemployed, (e) percent of adults 25+ with less than a high school education, (f) percent of housing units with more than 1 resident per room, (g) percent of housing units with no access to a vehicle, and (h) percent of housing units that are renter-occupied. Our goal was to understand racial differences in cancer-free survival and the moderating effects that might play a role in these differences. For simple descriptive analyses to compare the predictors by race, we used linear regression for continuous variables, multinomial regression for categorical variables, and ordinal regression for ordinal variables. We then developed a new modeling framework, described below, for understanding the multilevel moderation of the effect of race by the individual level and contextual level variables discussed above.

Model development
In order to fit hierarchical interactions between individual level variables and contextual ones, we first introduce a generalized surface varying coefficient model. Let $T$ be the logarithm of the failure time, $x = (x_1, \ldots, x_p)'$ a $p$-dimensional covariate vector of focus variables, and $z$ a $K$-dimensional vector of individual level variables. When $T$ is subject to right censoring, we observe $(y, \delta, x, z)$ with $y = \min(T, C)$, where $C$ is the logarithm of the censoring time and $\delta = 1\{T \le C\}$ is the censoring indicator. We assume that a random sample $\{y_{im}, \delta_{im}, x_{im}, z_{im}\}$; $i = 1, \ldots, n_m$, $m = 1, \ldots, M$, where $M$ represents the total number of tracts and $n_m$ the number of observations in tract $m$, has been collected from the parent distribution. We will assume the relationship between $y$ and $x$ and $z$ follows the model
$$y = \beta(z)'x + e, \qquad (1)$$
where $\beta(z)' = (\beta_1(z), \ldots, \beta_p(z))$. The $e$ are errors with an unknown distribution. Here the individual level variables $z$ are seen to be moderating the effect of $x$ on $y$. The true underlying form for $\beta_j(z)$; $j = 1, \ldots, p$ is an unknown complex $p$-dimensional hypersurface which modulates the effect of each $x_j$. We call this a surface-varying coefficient model. This is a multivariate analog of the varying coefficient models introduced by [17].

PRISM approximation and recursive partitioning
Estimating the model above directly proves challenging in higher dimensions, and thus our structured approximation takes, for each $\beta_j(z)$, the form
$$\beta_j(z) \approx \sum_{l=1}^{L} \theta_{jl}\, 1\{z \in A_l\},$$
where $A_1, \ldots, A_L$ is a partition of the predictor space determined by $z$. All $A_l$ are assumed mutually exclusive. This model, which we term the Partially Recursively Induced Structured Moderation (PRISM) regression approximation to (1), is defined as
$$y = \sum_{j=1}^{p} \sum_{l=1}^{L} \theta_{jl}\, 1\{z \in A_l\}\, x_j + e,$$
where $\theta_{jl}$ are the partition (node)-specific regression parameter estimates. This approximation therefore fits separate linear models with focus variables $x$ to the observations in each $A_l$, with membership determined by the individual level variables $z$. This is done using a weighted least squares approach to account for censoring. For each $A_l$, let $\hat{F}_{nl}$ be the Kaplan-Meier estimator of the distribution function $F$ of $T$ for observations in $A_l$. Following [18] and Stute and Wang (1993), we can write $\hat{F}_{nl}(y) = \sum_{i \in A_l} d_{ni}\, 1(y_{(i)} \le y)$, where the $d_{ni}$ are the Kaplan-Meier weights representing the jumps of the Kaplan-Meier estimator. Specifically, $d_{n1} = \delta_{(1)}/n$ and
$$d_{ni} = \frac{\delta_{(i)}}{n-i+1} \prod_{j=1}^{i-1} \left(\frac{n-j}{n-j+1}\right)^{\delta_{(j)}}, \qquad i = 2, \ldots, n,$$
where $y_{(1)} \le \cdots \le y_{(n)}$ are the ordered observations and $\delta_{(i)}$ is the censoring indicator attached to $y_{(i)}$.
Contextual level variables $w$ can be further incorporated into the structured moderation model as $\beta_j(z(w))$. This allows for multilevel moderation of the effect of $x$ on $y$ by both $z$ and $w$, and we term this model hierarchical PRISM or HPRISM. We use a tree-structured regression approach to fit the model [19]. As is well known, there are many advantages of this approach over traditional multilevel models based on linear (mixed) models. Tree-based models are generally good predictors for complex data. They are non-linear, non-parametric, resistant to outliers and missing values and, because of their piecewise structure, provide ease of interpretation. In addition, they naturally find complex interactions which may not be known a priori. When considering multilevel moderation of the effect of $x$, these can also be used as exploratory tools from which post-specified linear mixed models can be fit.
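The node-level weighted least squares step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard Stute form of the Kaplan-Meier weights, breaks ties by placing uncensored observations first, and uses hypothetical function names (`stute_weights`, `weighted_ls`). Both functions would be applied to the observations of a single partition $A_l$.

```python
import numpy as np

def stute_weights(y, delta):
    """Kaplan-Meier (Stute) weights, returned in the original observation order.

    y     : observed (log) times, min(T, C)
    delta : 1 if the event was observed, 0 if censored
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(y)
    # Sort by time; among ties, uncensored observations come first (assumption).
    order = np.lexsort((1 - delta, y))
    d_sorted = delta[order]
    w_sorted = np.empty(n)
    running = 1.0  # prod_{j<i} ((n-j)/(n-j+1))^{delta_(j)}
    for i in range(n):  # i is 0-based, so the rank is i + 1
        w_sorted[i] = d_sorted[i] / (n - i) * running
        running *= ((n - i - 1) / (n - i)) ** d_sorted[i]
    w = np.empty(n)
    w[order] = w_sorted
    return w

def weighted_ls(X, y, w):
    """Weighted least squares fit of y on the columns of X with weights w."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta

# Toy node: intercept plus a binary focus variable (e.g. race indicator).
x = np.array([0, 1, 0, 1, 0, 1], dtype=float)
y = np.array([2.0, 1.5, 2.2, 1.4, 2.1, 1.6])
delta = np.array([1, 1, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
theta_hat = weighted_ls(X, y, stute_weights(y, delta))
```

Censored observations receive zero weight, so only uncensored observations contribute to the node-specific estimates.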
In order to generate an empirical fit of HPRISM models using our sample $\{y_{im}, \delta_{im}, x_{im}, z_{im}, w_m\}$; $i = 1, \ldots, n_m$, $m = 1, \ldots, M$, we use a recursive partitioning approach as follows. For ease of discussion, take for example $p = 1$, $x$ = race, $z_1, \ldots, z_K$ and a single contextual level variable $w$ = percentpov (i.e. a census tract level social determinant), with $n = \sum_{m=1}^{M} n_m$ observations. Consider $z_k$ and all observed values $z_{k1}, \ldots, z_{kn}$. Index the root node as $\tau$, which pools all of the observations, that is, $(x_i, y_i, w_i)$, $i = 1, \ldots, \sum_{m=1}^{M} n_m$, where, with some abuse of notation, $w' = \{(w_1)_{n_1}, \ldots, (w_M)_{n_M}\}$. Consider a split $z_{ki} < s$ that generates daughter nodes $\tau_L$ and $\tau_R$. Then define the change in residual sum of squares (RSS) for this split as $\Delta\mathrm{RSS}(s)$, the reduction in RSS obtained by replacing the parent node with its two daughter nodes, where the residual of a censored observation is estimated as the mean value of all residuals of uncensored observations greater than $r_i$ [20].
Then choose the split value among all $z_k$'s that maximizes $\Delta\mathrm{RSS}(s)$. Recurse until a stopping rule is satisfied (see below). This can be visualized as a binary tree consisting of a set of terminal nodes whose corresponding branches are the recursively applied splitting rule from the root node onwards.
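The split search described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' code: the censoring weights `w` are taken as fixed inputs rather than recomputed within each daughter node, the change in RSS is taken to be parent RSS minus the sum of daughter RSS values, and the exclusion criteria are reduced to a minimum count of positively weighted observations per daughter (`min_pos`, a hypothetical parameter).

```python
import numpy as np

def weighted_rss(X, y, w):
    """Weighted residual sum of squares after a weighted least squares fit."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ theta
    return float(np.sum(w * resid ** 2))

def best_split(z, X, y, w, min_pos=2):
    """Search split values s of one variable z for the largest drop in RSS.

    Observations with z < s go left, the rest go right.
    Returns (best split value or None, best reduction in RSS).
    """
    parent = weighted_rss(X, y, w)
    best_s, best_gain = None, 0.0
    for s in np.unique(z)[1:]:  # the smallest value would leave the left node empty
        left = z < s
        right = ~left
        # Simplified exclusion rule: each daughter needs enough uncensored data.
        if (w[left] > 0).sum() < min_pos or (w[right] > 0).sum() < min_pos:
            continue
        gain = (parent
                - weighted_rss(X[left], y[left], w[left])
                - weighted_rss(X[right], y[right], w[right]))
        if gain > best_gain:
            best_s, best_gain = s, gain
    return best_s, best_gain

# Toy data: the effect of the focus variable x flips sign at z = 5.
z = np.repeat(np.arange(10), 4).astype(float)
x = np.tile([0.0, 1.0, 0.0, 1.0], 10)
y = np.where(z < 5, x, -x)
X = np.column_stack([np.ones_like(x), x])
w = np.full(len(y), 1.0 / len(y))
split_value, gain = best_split(z, X, y, w)
```

A full tree would repeat this search over all $z_k$ in each node and recurse on the two daughters until the stopping rule applies.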
Other potential splitting rules. The log-rank test statistic is a widely used splitting criterion. To accommodate log-rank splitting in our scenario, we can calculate the log-rank test statistic within each daughter node with respect to the focus variable (using the focus variable to group observations in each daughter node). We then take the absolute value of the difference between the log-rank statistics for the two daughter nodes and use this value as the splitting criterion. The best split is determined by searching for the splitting variable $z_k$ and split value $s$ that maximize the absolute difference. In addition, other statistics such as Uno's C [21] and Harrell's C [22] can be used to form splitting criteria as well.
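As a rough illustration of the log-rank alternative, the sketch below computes a two-group log-rank chi-square statistic by hand and then forms the absolute difference between the two daughter nodes. The function names are hypothetical, and the standard hypergeometric variance is assumed; the source does not specify which version of the log-rank statistic is intended.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-group log-rank chi-square statistic (group coded 0/1)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    group = np.asarray(group)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                         # at risk overall
        n1 = (at_risk & (group == 1)).sum()       # at risk in group 1
        died = (time == t) & (event == 1)
        d = died.sum()                            # events overall at t
        d1 = (died & (group == 1)).sum()          # events in group 1 at t
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return 0.0 if var == 0 else o_minus_e ** 2 / var

def logrank_split_criterion(z, s, time, event, group):
    """Absolute difference between daughter-node log-rank statistics for split z < s."""
    z, time = np.asarray(z), np.asarray(time, dtype=float)
    event, group = np.asarray(event), np.asarray(group)
    left = z < s
    stat_left = logrank_stat(time[left], event[left], group[left])
    stat_right = logrank_stat(time[~left], event[~left], group[~left])
    return abs(stat_left - stat_right)

# Toy example: no group difference on the left, a clear difference on the right.
z = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
time = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 10, 11, 12], dtype=float)
event = np.ones(12, dtype=int)
group = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])
criterion = logrank_split_criterion(z, 1, time, event, group)
```

Because the log-rank statistic depends only on the ordering of times, it gives the same value on the raw and the log time scale.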
Exclusion criteria. Since all censored observations will get zero weights, only the uncensored observations need to be considered. For each potential split, there should be uncensored observations in both right and left daughter nodes representing the different racial groups, and within each racial group, the uncensored observations should not come from the same census tract. Thus, this will allow estimation of both the main effect of race and interaction between race and the tract level social determinant post-splitting.

Stopping rule and estimating tree size
A variety of stopping rules or tree size estimation strategies can be borrowed from the CART literature. For instance, one can stop splitting when the number of observations in a daughter node would fall below some minimal number. One can also impose a restriction that splitting only be allowed when the overall tree goodness of fit criterion improves by some minimal margin. The $C_p$ method (reference) is such a rule, where the complexity parameter $C_p$ is typically set at 0.01, indicating that any split that does not decrease the overall lack of fit by this factor will not be attempted. The optimal choice of $C_p$ is open to scrutiny, but techniques like cross-validation could be used to estimate it. Finally, tree pruning, where a generously grown tree is pruned backwards via weakest link cost complexity pruning, can be implemented. This also requires something like cross-validation and can be computationally prohibitive. In our case, given the more complex nature of the tree-growing procedure, we chose to set a user-defined value of $C_p$ at 0.01 (the default in, for example, the rpart R package).

Evaluating predictive performance
All PRISM and HPRISM models were evaluated for their predictive performance. The FCDS cohort was stratified by censoring status and then split by strata into an 80% training set and a 20% test set. Models were built on the training set and then test observations were fed down each tree based on their x and w values, and a predicted value of the log survival time was estimated from the terminal nodes that each test observation fell into. This process was repeated 100 times and the mean empirical test set Harrell's C statistic [22] and its standard error (SE) were reported. Harrell's C statistic is a widely accepted measure of predictive performance based on validation data that may be subject to right censoring. C statistics are routinely used in the medical literature to quantify the capacity of an estimated risk score to discriminate among subjects with different event times. The C statistic provides a global assessment of a fitted survival model rather than focusing on the prediction for a fixed time.
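The evaluation loop can be illustrated with a small sketch of Harrell's C and a censoring-stratified 80/20 split. The function names are hypothetical, ties in predicted values are assumed to count one half (relevant here because tree predictions are constant within terminal nodes), and predictions are on the survival-time scale, so a longer predicted time should accompany a longer observed time.

```python
import numpy as np

def harrell_c(time, event, pred_time):
    """Harrell's C for predicted survival times under right censoring.

    A pair (i, j) is comparable when i has an observed event and time_i < time_j.
    It is concordant when the prediction for i is also smaller than that for j.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    pred = np.asarray(pred_time, dtype=float)
    comparable = (time[:, None] < time[None, :]) & event[:, None]
    concordant = comparable & (pred[:, None] < pred[None, :])
    tied = comparable & (pred[:, None] == pred[None, :])
    return (concordant.sum() + 0.5 * tied.sum()) / comparable.sum()

def stratified_split(event, test_frac=0.2, seed=None):
    """Boolean train/test masks, sampling test_frac within each censoring stratum."""
    rng = np.random.default_rng(seed)
    event = np.asarray(event)
    test = np.zeros(len(event), dtype=bool)
    for status in np.unique(event):
        idx = rng.permutation(np.flatnonzero(event == status))
        test[idx[: int(round(test_frac * len(idx)))]] = True
    return ~test, test

# Toy usage: repeat the split and average the test-set C statistic.
rng = np.random.default_rng(0)
time = rng.exponential(size=200)
event = rng.integers(0, 2, size=200)
pred = time + rng.normal(scale=0.5, size=200)  # stand-in for model predictions
c_values = []
for rep in range(100):
    _, test = stratified_split(event, 0.2, seed=rep)
    c_values.append(harrell_c(time[test], event[test], pred[test]))
mean_c = float(np.mean(c_values))
se_c = float(np.std(c_values, ddof=1) / np.sqrt(len(c_values)))
```

In an actual analysis the model would be refit on each training set before predicting on the corresponding test set; the toy predictions above only stand in for that step.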

Local variable importance (Lvimp) and interpreting the trees
As will be shown, the tree models can be quite complex and challenging to interpret directly. This is because it is natural to focus on the overall tree topologies rather than the terminal nodes themselves, which represent the fitted models. Here we will develop customized strategies for better interpretation. Variable importance was originally designed for tree-based models using measures involving surrogate variables [19]. Other measures based on mean overall improvement in node impurity for a tree have also been proposed. One interesting such measure is called variable importance (vimp) [23], which uses a prediction error approach involving "noising-up" one variable at a time and examining the difference between the prediction error when that variable is noised-up by permuting its values randomly and the prediction error under the original predictor. Variables with large vimp values are ranked highly in terms of variable importance. In our setting, however, we want to understand the individual level variables that are driving each terminal node, hence a local vimp (Lvimp). In order to do this, we must condition on the observations in a terminal node while we apply the noising-up procedure variable by variable and rebuild the entire tree. Differences in (weighted) prediction errors are evaluated only using those observations in the terminal node of interest.
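One way to read the Lvimp procedure is sketched below. It is only an interpretation: the model-building step is abstracted into generic `fit` and `predict` callables (hypothetical names), a simple weighted linear model stands in for the tree so that the example runs, and the prediction error is taken to be the weighted squared error over the observations of the terminal node of interest.

```python
import numpy as np

def local_vimp(fit, predict, Z, y, w, node_mask, seed=None):
    """Local permutation importance for the observations flagged by node_mask.

    fit(Z, y, w) -> model and predict(model, Z) -> predictions are supplied
    by the caller. For each column of Z, the column is permuted, the model is
    rebuilt, and the change in weighted squared error inside the node is recorded.
    """
    rng = np.random.default_rng(seed)

    def node_error(model, Zmat):
        resid = y[node_mask] - predict(model, Zmat[node_mask])
        return float(np.sum(w[node_mask] * resid ** 2))

    baseline = node_error(fit(Z, y, w), Z)
    importance = np.empty(Z.shape[1])
    for k in range(Z.shape[1]):
        Z_noised = Z.copy()
        Z_noised[:, k] = rng.permutation(Z_noised[:, k])  # noise up variable k
        importance[k] = node_error(fit(Z_noised, y, w), Z_noised) - baseline
    return importance

# Stand-in model (assumption): weighted linear regression instead of a tree.
def fit_linear(Z, y, w):
    X = np.column_stack([np.ones(len(Z)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def predict_linear(coef, Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ coef

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
y = 3.0 * Z[:, 0]                 # only the first variable matters
w = np.full(200, 1.0 / 200)
node_mask = Z[:, 0] > 0           # pretend this defines a terminal node
lvimp = local_vimp(fit_linear, predict_linear, Z, y, w, node_mask, seed=1)
```

Values at or below zero would indicate that permuting the variable did not worsen prediction within that node.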
Beyond the obvious usefulness of the Lvimp measures in terms of understanding the variables most important in determining a particular subgroup (i.e. terminal node), the Lvimp values can also be used to better visualize the relative "distance" between terminal nodes in a tree. This can be done using an old graphical tool for visualizing multivariate data called the Andrews curve [24]. Generically, Andrews curves start with a vector $c$, which is a high dimensional datapoint where $c = \{c_1, c_2, \ldots, c_d\}$ in $\mathbb{R}^d$. We can then define the finite Fourier series
$$f_c(t) = \frac{c_1}{\sqrt{2}} + c_2 \sin(t) + c_3 \cos(t) + c_4 \sin(2t) + c_5 \cos(2t) + \cdots, \qquad -\pi \le t \le \pi.$$
By substituting in the rank-ordered Lvimp measures for each terminal node for $c$, we can generate an Andrews curve for each splitting path (branch) which defines a terminal node.
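For concreteness, the Andrews curve of a single vector can be evaluated as in the sketch below, which uses the standard Andrews (1972) finite Fourier series. The function name and the example Lvimp values are hypothetical.

```python
import numpy as np

def andrews_curve(c, t):
    """Evaluate f_c(t) = c1/sqrt(2) + c2 sin(t) + c3 cos(t) + c4 sin(2t) + ..."""
    c = np.asarray(c, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, c[0] / np.sqrt(2.0))
    for i, ci in enumerate(c[1:], start=1):
        k = (i + 1) // 2  # harmonic index: 1, 1, 2, 2, 3, 3, ...
        f += ci * (np.sin(k * t) if i % 2 == 1 else np.cos(k * t))
    return f

# One curve per terminal node, using made-up rank-ordered Lvimp values.
t_grid = np.linspace(-np.pi, np.pi, 201)
node_lvimp = {
    "T1": [0.08, 0.03, 0.01, 0.0],
    "T2": [0.05, 0.05, 0.02, 0.0],
}
curves = {node: andrews_curve(v, t_grid) for node, v in node_lvimp.items()}
```

Terminal nodes whose curves lie close together over the whole range of $t$ have similar Lvimp profiles.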

Specific area-level disparity estimation (SPADE)
Understanding how disparity is distributed and (hierarchically) moderated within a geographical area at the tract level is of great interest. Direct estimation at the tract level is not possible because all individuals in a tract share the same value of the social determinant. However, the HPRISM model allows such an estimate to be reverse engineered. Notice that within each terminal node, we have estimated $\theta_l' = (\theta_{0l}, \theta_{1l}, \theta_{2l})$ locally and each terminal node consists of a mixture of observations from different census tracts (areas). Thus, for $x = 0$ if White and $x = 1$ if Black, the predicted mean log survival difference (i.e. disparity) for observations in a terminal node $l$ from tract $m$ is $d_{lm} = \theta_{1l} + \theta_{2l} w_m$. To go back to the tract level itself, gather all of the observations from tract $m$ from each terminal node and form the weighted average $d_m = \sum_l \eta_{lm} d_{lm}$. The weights $\eta_{lm}$ can be made flexible but typically correspond to the relative proportions of individuals from a given tract in a terminal node. We call this the specific area-level disparity estimate or SPADE. We can then plot these as a heat map by census tract over the state of Florida. In order to combine all individual SPADEs together, we generated a composite SPADE. This will give an indication of the effect of considering all social determinants together. The simplest composite estimate is the sum of the individual SPADEs. This can be shown to yield a somewhat crude but still useful upper bound to the true composite quantity.
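The tract-level aggregation can be sketched as follows. This is an illustration under explicit assumptions: the node-level disparity is taken to be the race main effect plus the race-by-tract-variable interaction evaluated at the tract value, and the weights $\eta_{lm}$ are taken to be the share of tract $m$'s observations that fall in terminal node $l$. The function name `spade` and its argument layout are hypothetical.

```python
import numpy as np

def spade(tract, node, w, theta1, theta2):
    """Tract-level disparity estimates from node-level HPRISM coefficients.

    tract, node : tract id and terminal node id for each observation
    w           : tract-level social determinant value for each observation
    theta1      : dict node -> estimated race main effect
    theta2      : dict node -> estimated race-by-w interaction
    """
    tract = np.asarray(tract)
    node = np.asarray(node)
    w = np.asarray(w, dtype=float)
    estimates = {}
    for m in np.unique(tract).tolist():
        in_m = tract == m
        w_m = w[in_m][0]  # all observations in a tract share the same w
        d_m = 0.0
        for l in np.unique(node[in_m]).tolist():
            eta_lm = np.mean(node[in_m] == l)        # share of tract m in node l
            d_lm = theta1[l] + theta2[l] * w_m       # node-level disparity at w_m
            d_m += eta_lm * d_lm
        estimates[m] = d_m
    return estimates

# Toy example with two tracts and two terminal nodes.
tract = [1, 1, 1, 1, 2, 2]
node = [0, 0, 0, 1, 1, 1]
w = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
theta1 = {0: -0.5, 1: -1.0}
theta2 = {0: 0.01, 1: 0.0}
tract_spade = spade(tract, node, w, theta1, theta2)
```

A composite estimate in the simplest sense described above would add the per-determinant results tract by tract.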

Results
There were 13,506 cases in this derived cohort of which over 11% (1558) were NHB. There are significant differences between racial groups with respect to age, Medicaid and not insured status, frequency of undergoing a hysterectomy, frequency of receiving chemotherapy, marital status in a number of categories, frequency of histology revealing endometrioid adenocarcinoma, stage and grade of tumor. Table 1 shows overall differences between BNH and WNH for each individual level variable. All p-values were derived from a chi-squared test of homogeneity between races across levels of the variable except for Marital Status and Histology which required use of Fisher's Exact test due to small expected cell numbers. For Age which is a continuous variable in the modeling, we have displayed quartile groups in Table 1. All p-values are very small indicating strong racial differences. Tables 2 and 3 drill down further and explore racial differences for  overall survival. As expected, non-Hispanic Black women with endometrial cancer face poorer survival than non-Hispanic White women, reflected in a widening gap between the corresponding curves (p < 0.0001). Fig 1 illustrates the multilevel nature of interactions between race and individual level and area level variables. Kaplan-Meier curves broken out by age-specific racial groups are paneled by ranges of census tract level median income levels. Since the difference between these curves across panels is not constant, this reflects a multilevel interaction. Similar discoveries are made with other social determinants (not shown). Figs 2 and 3 show the PRISM and HPRISM median income fits respectively (as depicted by tree-like topological graphs) to our cohort. 
Each node is split at a value of one of the individual level variables and observations are sent to the left or right resulting daughter nodes (colored blue) according to whether they affirmatively obey the split rule or not and this process is recursively repeated until the stopping rule is invoked, at which point an observation reaches a terminal node (colored pink). Within each node is printed the mean log survival time, the intercept, race effect and additionally the interaction of race with tract median income (for the HPRISM tree). Each terminal node represents a discovered subgroup where the racial effect on survival is moderated significantly differently from the root node. These do not represent the same simple subgroups shown in Fig 1; Fig 1 is only a descriptive plot that says that hierarchical moderation may be occurring but does not discover the complex underlying nature that might fully be at play.
When examining the trees more carefully, it is quite clear that Stage and Age at Diagnosis are potentially important variables for the HPRISM tree, and marital status, type of insurance coverage and surgery additionally show up for the PRISM tree. We will later show how Lvimp measures will help with interpreting the tree branch paths and formally allow us to rank order the variables. One thing that is noticeable is that some of the HPRISM trees appear topologically simpler than the PRISM tree. It turns out that there is a combined mathematical and operational reason for this when it happens. A sketch goes as follows: call the HPRISM tree $T_H$ and the PRISM tree $T_P$. Then, we know that in the root node $\tau_{root}$, $\mathrm{RSS}_{T_H}(\tau_{root}) \le \mathrm{RSS}_{T_P}(\tau_{root})$, because the HPRISM model includes an extra interaction term with $w$. This is the case at each step of splitting. In addition, the total number of split points for a given splitting variable is also smaller for HPRISM due to the split point exclusion criteria. These two facts, taken together, imply that the number of splits possible before stopping can likely be less for HPRISM than PRISM.
It is customary to expect p-values for the node-specific interaction estimates as a measure of their significance. However, it is well known that these p-values are overly optimistic (small) because they do not account for the amount of fitting done in building the tree. Correcting for such optimism is not easy and relies on approximate distribution theory based on estimates of model complexity. Another option would be to try and use a bootstrap-based approach as was done for the case of phylogenetic tree analysis [25,26].
Model adequacy was assessed through estimated test set predictive fit using Harrell's C statistic, as described in the Methods section. Table 4 shows Harrell C comparisons of the PRISM model versus the various HPRISM models along with accompanying standard error estimates. Increases in empirical test set Harrell C statistic were seen for all HPRISM models over the PRISM model.
This provides unbiased validation of the fact that multilevel determinants at the individual and contextual level are indeed important in explaining racial disparity in endometrial cancer survival in the FCDS cohort. When specifically comparing GINI and MedIncome for instance, GINI provides slightly more predictive information than MedIncome does, with Harrell C values of 0.531 and 0.576 respectively. Zhu and colleagues [27] used nomograms for predicting cancer-specific and overall survival among patients with EC. They specifically found a predicted C index of 0.782 (95% CI (0.772, 0.792)) for overall survival derived from a Cox analysis. Miller and colleagues [28] studied EC recurrence prediction and, when using a University of Iowa cohort, found very predictive models using both clinical data and also when incorporating genomic predictors (AUC > 0.90). However, when they took their models to TCGA data, the AUCs dropped significantly (AUC ≈ 0.60-0.66). Madison and colleagues [13], using SEER data, found that being Black, increased age, aggressive histology, poor tumor grade and advanced stage of disease were associated with increased risk of death. Tejerizo-Garcia and colleagues [29] looked at prognostic factors of overall survival and disease-free survival in a cohort of 276 older patients in Spain. They found that FIGO stage and tumor grade were independent prognostic factors of overall survival in EC patients. More can be learned by a deeper examination of the HPRISM tree topologies. Each branch path to a terminal node for an HPRISM tree explicitly details a discovered hierarchical interaction. Fig 4 shows another way to display such interactions. For illustration, plotted are the differences in log survival time between WNH and BNH versus GINI and MedIncome respectively. Each dotted line represents a different terminal node from the HPRISM tree.
Conditioning on a particular value of the social determinant variable, we see that the data points on the lines do not coincide. This indicates the first layer of interaction (i.e. the interaction of individual level variables defining the terminal nodes with race). The fact that the lines are not parallel across different values of GINI is indicative of the hierarchical interaction of race with the individual level variables that define the terminal nodes with the social determinant. Notice though that for MedIncome, the lines are nearly parallel, indicating a much weaker hierarchical moderation effect. It then is reasonable to ask why the HPRISM MedIncome tree is topologically different from the PRISM tree. The differences can in part be explained by the different main effect estimates for race high up in the trees, which can produce different choices of splitting variables. This initial difference is propagated down the trees. Hierarchical interaction plots for all other HPRISM trees are available in the S1 File. Tables 5 and 6 report Lvimp ranks for the PRISM tree and for the MedIncome and GINI HPRISM trees, respectively. The rows of each table give the name of the individual level variable and the columns index terminal node numbers. Table entries show the Lvimp values and their ranks in parentheses. Age appears to be the overwhelmingly most important variable describing BNH-WNH disparity in most terminal nodes for all HPRISM trees, followed by Stage for the MedIncome tree and Stage, Grade, Marital status, Surgery and Insurance for the GINI tree. Note that Lvimp values less than or equal to zero indicate variables that are not important in that branch. Lvimp tables for all other HPRISM trees are available in the S1 File. These Lvimp tables illustrate the moderation effect on an individual-level variable, as it is clear how the rankings across terminal nodes for each variable can change with and without contextual moderation.
Fig 5 plots Andrews curves for the PRISM and HPRISM GINI and MedIncome trees to visualize how different or similar the terminal nodes are with respect to racial disparity. Each line represents a unique observation and all observations of the same color are within the same terminal node. Notice how the PRISM tree is less able to find distinct subgroups than either of the HPRISM trees. In particular, the MedIncome HPRISM tree found wider differences across terminal nodes (i.e. different disparity subgroups) than the GINI tree did. Andrews plots for all other HPRISM trees are available in the S1 File (S3 Fig). Using the GINI HPRISM tree as an example, S4 and S5 Figs (S1 File) show the raw tract level MedIncome values for observations grouped by the HPRISM MedIncome tree terminal nodes (the node numbering system is not important here). Different colors correspond to different terminal nodes and the shading of the color indicates the magnitude of the tract level MedIncome value. All other such raw tract value by tree node plots are available in the S1 File. These plots clearly demonstrate that each terminal node consists of observations from a mixture of tracts and that the number of tracts represented in each terminal node can vary greatly. This feature paves the way for SPADE calculations at each tract based on the methods previously described. S6 Fig (S1 File) shows heat maps of our new SPADEs for tracts across Florida as derived from the PRISM and HPRISM GINI and MedIncome trees. We have plotted tract level z-scores for the estimated weighted ratio in survival times of WNH to BNH for each heat map.
Table 5. Local variable importance measure for the PRISM tree. T1, . . ., T12 indicate terminal node number.
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12
Prism.Insurance 0.0811 (5)
With the various HPRISM heat maps, we had to use a robust z-score instead of the usual z-score because the distributions of tract level disparity estimates were heavy-tailed and heavily skewed. This standardization makes the tract SPADE estimates more comparable across heat maps. While these are derived area level measures, they come from models that account for both individual level and area level moderation of racial differences in survival. There are clear differences between the PRISM and HPRISM plots. The PRISM heat map is heavily concentrated around the middle of the disparity distribution (i.e. purple colored), whereas the HPRISM ones show much more diversity in colors. These are striking indications of hierarchical moderation of race (and by extension, disparity) by tract level variables. Some patterns become apparent. Once again using MedIncome as a focus, overlaying MedIncome and comparing S4 and S5 Figs in S1 File shows that regions where MedIncome is high tend to exacerbate WNH:BNH racial disparities, with WNH surviving longer on average than BNH, as compared to the PRISM SPADE estimates. This effect is most noticeable in large metropolitan regions like Jacksonville, South Florida, and the Tampa/Orlando area. The opposite effect happens for

Tree.Variable T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13
MedIncome.Insurance -0.0006 (7) -0.0029 (6) 0.0000 (2) 0.0000 (2) 0.0008 (4) 0.0000 (3)
MedIncome.Surgery -0.0003 (6) 0.0000 (1) 0.0000 (2) 0.0000 (2) 0.0000 (5) 0.0000
GINI.Radiation 0.0000 (2) 0.0000 (1) 0.0000 (9) 0.0000 (2) 0.0000 (6) 0.0000 (3) 0.0000 (8) 0.0000 (3) 0.0000 (9) 0.0000 (7) 0.0000 (7) 0.0000 (9) 0.0000
GINI.Chemo 0.0000 -0.0530 (4) 0.0206 (7) 0.0000 (2) 0.0000 (6) 0.0000 (3) 0.0000 (8) 0.0813 (2) 0.0373 (7) 0.0000 (7) 0.0159

There are some places where the direct estimate and the SPADE estimate agree and some where they disagree. The direct estimates in these tracts tend to be based on small sample sizes; hence they are highly variable and less trustworthy. Importantly, comparing these figures also makes clear that the SPADE estimates borrow information across tracts, which is why so many more tracts have estimates. This is akin to techniques used in small area estimation [30].
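The tract-level borrowing of information and the robust standardization discussed in this section can be illustrated with a minimal sketch. It rests on two assumptions that are not taken from the text: (1) a tract's estimate is the average, over that tract's patients, of the disparity ratio of the terminal node each patient falls into (equivalently, a node-share-weighted average of node-level ratios), and (2) the robust z-score is the common median/MAD form. The function names `tract_estimates` and `robust_z` and the toy data are hypothetical, and the actual SPADE definition given in the Methods may differ.

```python
import numpy as np


def tract_estimates(tract_ids, node_ids, node_ratio):
    """Average node-level disparity ratios over the patients in each tract.

    ASSUMPTION: this node-share-weighted average only illustrates how a tract
    can borrow information from the terminal nodes its patients fall into.
    """
    tract_ids = np.asarray(tract_ids)
    # Each patient inherits the estimated ratio of their terminal node.
    per_patient = np.array([node_ratio[n] for n in node_ids], dtype=float)
    tracts = np.unique(tract_ids)
    est = np.array([per_patient[tract_ids == tr].mean() for tr in tracts])
    return tracts, est


def robust_z(values):
    """Robust z-score: (x - median) / (1.4826 * MAD).

    ASSUMPTION: median/MAD is one common definition; the constant 1.4826
    makes the MAD comparable to a standard deviation under normality.
    """
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return (values - med) / (1.4826 * mad)


# Toy data: six patients in three tracts, assigned to two terminal nodes.
toy_tracts = ["A", "A", "B", "B", "B", "C"]
toy_nodes = [1, 2, 1, 1, 2, 2]
toy_node_ratio = {1: 1.0, 2: 2.0}  # hypothetical node-level ratios

names, est = tract_estimates(toy_tracts, toy_nodes, toy_node_ratio)
z = robust_z(est)
```

In this toy example tract C contains a single patient yet still receives an estimate, because the node-level ratio is informed by patients from other tracts in the same terminal node.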

Discussion
While racial disparities in endometrial cancer survival have previously been studied, the understanding of underlying moderators has been limited by existing statistical methods. Using a new method, this study estimated the moderation of racial disparities in endometrial cancer survival, integrating the hierarchical moderation effects of individual/clinical and social determinants of health in a way that is faithful to the underlying social ecological framework for understanding disparities. Ecological models have been used to study disparities before, but the accompanying quantitative approaches either did not allow hierarchical interactions or were not pursued because of sample size issues [4].
In this study, we specifically made the following important methodological contributions to address these limitations: i) we developed a non-parametric modeling framework for structured moderation (PRISM) which, because of its tree-like structure, also yields interpretable models of moderation; ii) this modeling framework allowed for the identification of disparity sub-groups without their a priori specification, thereby allowing for the possible lack of homogeneity in disparity in the training dataset; iii) we developed a new measure of local variable importance within the context of the PRISM model; iv) we demonstrated some new uses of data visualization tools to extract interpretable structure from the model; and v) we developed a new estimator (the SPADE) for examining the geographical variation of the model estimated disparities, employing a new type of borrowing strength across census tracts by grouping patients with similar hierarchical moderation effects of race differences as determined by their individual-level determinant variables, allowing more reliable estimates in sparsely populated tracts. In doing so, the SPADE also captures variation across grouped tracts, which may mimic unmeasured variation within tracts that arises from using average contextual level information. To the best of our knowledge, no such estimator has been previously developed. We have previously identified a generalization to unit level small area estimation (SAE) [30] using trees [31]. However, much of this work did not consider hierarchical interactions. McConville and Toth developed a tree-based approach for automated selection of post strata when estimating finite population totals but did not deal with small area models [32].
Despite its strengths, our study has several important limitations. First, using cancer registry data imposes methodological challenges, including multiple cancer diagnoses, duplicate reports, reporting delays, and misclassification of race [33]. While these can negatively impact model fit and predictive performance, we used a rigorous, unbiased predictive measure and our models retained robust predictive performance. Second, our models do not assess (moderated) race as a causal factor, but as a predictor. There is existing work on causal inference trees, but these have not been generalized to handle the hierarchical moderation required in the social ecological framework [34]. Third, we have based our analysis around a hierarchical moderation framework motivated by social ecological theory. While our new models demonstrate good fit to the data, it is possible that other competing models that assume a different structured approximation might also fit the data well. A deeper comparison of different model formulations is forthcoming. Fourth, we used a binary framework for the surgery, chemotherapy, and radiation variables. These treatments are primary drivers of survival and depend on the type and stage of EC, which makes it more complicated to interpret the benefit of a more robust modeling approach. Another limitation is the highly selective inclusion criteria. Great care went into eliminating any combination of grade and histology codes that were contradictory (e.g. carcinosarcoma histology and grade 1 nuclear features) and those for which the site of tumor origin was unclear (e.g. adenocarcinoma) or incorrect (e.g. squamous cell carcinoma). While such selectivity likely decreased the sample size and thus potentially the power to identify definitive associations between groups, it was felt that the risk of misclassification bias was too large if a general, unfiltered data set was analyzed.
Finally, although the role of genetics is not modeled here, genetic determinants may be important moderators, given that ecological factors may contribute to epigenetic differences across racial groups.
While the identification and understanding of disparities was our primary goal, our results may inform multilevel interventions that could be used to attenuate disparities and potentially predict effectiveness across neighborhoods [35]. By understanding how area-level factors (often influenced by policy) interact with individual-level factors to determine disparities in endometrial cancer survival, and recognizing that these interactions result in pockets of more extreme disparity distributed spatially across the state, one could conceivably design targeted multilevel intervention strategies that are more effective in reducing disparity.